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Abstract. In this paper, we revisit an auxiliary space preconditioning method 
proposed by Xu [Computing 56, 1996], in which low-order finite element spaces 
are employed as auxiliary spaces for solving linear algebraic systems arising 
from high-order finite element discretizations. We provide a new convergence 
rate estimate and parallel implementation of the proposed algorithm. We show 
that this method is user-friendly and can play an important role in a variety 
of Poisson-based solvers for more challenging problems such as the Navier- 
Stokes equation. We investigate the performance of the proposed algorithm 
using the Poisson equation and the Stokes equation on 3D unstructured grids. 
Numerical results demonstrate the advantages of the proposed algorithm in 
terms of efficiency, robustness, and parallel scalability. 

1. Introduction 

Iterative methods have been successfully applied to large-scale sparse linear sys- 
tems arising from discretizations of partial differential equations (PDEs). Many 
linear systems of equations can be handled by preconditioned Krylov subspace 
methods [USD]. In fact, preconditioners play a crucial role in the convergence of 
iterative methods. The construction of an "ideal" preconditioner depends on fol- 
lowing basic, sometimes contradictory, guidelines: (1) Optimality and Robustness: 
The convergence rate of an appropriate iterative methods on the preconditioned 
system is uniform or nearly uniform, independent of mesh size or physical param- 
eters; (2) Cost-effectiveness and Scalability: The computational costs and memory 
requirements of constructing and applying the preconditioning action are low and 
have good parallel scalability; and (3) User-friendliness: Users require little infor- 
mation and implementation is not difficult. 

The Poisson equation —Am = / and its variants arise in many applications. The 
geometric multigrid (GMG) method is one of the most efficient iterative methods 
for solving discrete Poisson or Poisson-like equations. A vast number of works have 
explored multigrid methods; references include the monographs and the survey 
papers [HJ EH [H 08]. Though the classical multigrid algorithm based on a geometric 
hierarchy can be an effective solver for a well-structured, it is usually very difficult 
to obtain such a hierarchy in practice. The algebraic multigrid (AMG) method [T51 
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H21 H3 SHI H31 [19] , on the other hand, requires minimal geometric information about 
the underlying problem and can sometimes be employed as a "black-box" iterative 
solver or preconditioner for other iterative methods. The version known as the 
Classical AMG [T5J [37] is used frequently and has been shown to be effective for 
a range of problems in practice. In an effort to render AMG methodologies more 
broadly applicable and to improve robustness, various versions have been developed; 
for example, see 01 [S5J [351 US] • 

AMG methods are readily applicable and potentially scalable for large 3D prob- 
lems. Recently, parallel versions of multigrid methods have attracted a lot of atten- 
tion (and will continue to do so) because of their fundamental role in modern com- 
putational mathematics and engineering; see [211 O SU [5] and references therein 
for details. In this paper, we will not discuss parallelization and implementation 
of AMG. Throughout this paper, we employ the Parallel Modified Independent Set 
(PMIS) coarsening strategy 45J and the Extend+i+cc interpolation [33] in Boomer- 
AMG of hypre package jl , which has been numerically proven to be efficient and 
scalable [5j. 

Although AMG methods have been proven effective for many problems, it is 
important to note that generally the performance of the Classical AMG method 
deteriorates for high-order finite elements (see Shu, Sun and Xu [33J for the 2D test 
examples). In Table [lj we show a simple numerical experiment. It is easy to see 
that for about the same degree of freedom (10 7 ) the convergence rate of the AMG 
method (PMIS and Extend+i+cc) deteriorates. Furthermore, the performance of 
AMG is very sensitive to the strength threshold 9 in the coarsening procedure (see 
^6] for details). On the other hand, it is clear that AMG can be also very effective 
for the discrete Poisson equations in relatively low order finite element spaces. 

Table 1 . Number of iterations for the AMG preconditioned GM- 
RES method. We solve the 3D Poisson equation with 64 pro- 
cessing cores (piecewise continuous Lagrangian finite element dis- 
cretizations are applied, the stopping criterion is when the relative 
residual is less than 10 -6 , and DOF is the total degree of freedom.) 



Element Type 


DOF 


9 = 0.25 


9 = 0.5 


9 = 0.7 


9 = 0.8 


9 = 0.9 


pl,0 


13M 


5 


5 


8 


10 


13 


p2,0 


13M 


6 


7 


10 


13 


17 


p3,0 


13M 


8 


10 


12 


15 


18 


p4,0 


12M 


>500 


>500 


17 


18 


22 



Studies have proposed using a two-level approach to handle Poisson equations 
on the higher-order finite element spaces. Such an approach would consist of (1) 
a smoother for the Poisson equation on the higher order finite element spaces, (2) 
transfer operators between the higher-order finite element spaces and the lower- 
order finite element spaces, and (3) an AMG method applied to the Poisson equa- 
tions defined on the lower-order finite element space. Shu, Sun and Xu |43j has 
designed an algebraic multigrid method by constructing lower order finite element 
coefficient matrices algebraically with the help of characteristics of Lagrangian finite 
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element spaces. Their study is restricted to the quadratic and cubic Lagrangian fi- 
nite element discretizations in 2D. Another attempt to use low-order finite element 
space for preconditioning can be found in Heys, et al. |26j . 

The solution technique to discrete Poisson equations is itself of great interest. 
However, even more compelling are Poisson-based solution techniques that can be 
applied to constructing an efficient solver for more complicated problems |53j . Over 
the last few decades, intensive research has been devoted to developing efficient lin- 
ear solvers for almost all kinds of sparse linear systems in scientific and engineering 
computing. The main idea of efficient preconditioning is to transform a seem- 
ingly intractable problem to a (sequence of) problem(s) that can be approximated 
rapidly. One such mathematical technique is a general framework called Auxiliary 
Space Preconditioning or ASP [53J IM] ■ This method represents a large class of 
preconditioners that (1) by using auxiliary spaces transform a complicated system 
into a sequence of simpler systems, and (2) construct efficient preconditioners with 
efficient solvers for these simpler systems. Based on fast Poisson solvers and an- 
alytic insight into PDEs or PDE systems, efficient solvers can be developed using 
the auxiliary space preconditioning framework for various cases that arise in practi- 
cal computations. Successful examples include simple and complex fluid problems, 
linear elasticity, and H(grad), H(div), and H{curl) systems with applications to 
the Maxwell equations [2"71 1531 13"3"] . 

In this paper, we revisit the algorithm proposed in [53] for solving a large-scale 
discrete second-order elliptic equations by high-order finite element methods. More- 
over, using easily available mesh information, we provide a parallel implementation 
of this auxiliary space preconditioner and analyze its performance for problems with 
about half a billion unknowns in terms of the robustness, efficiency, and scalabil- 
ity. This paper makes an additional contribution by providing an alternative proof 
for the convergence rate of the proposed algorithms. Lastly, the proposed method 
will be applied to solving the 3D Stokes equation on unstructured meshes. It is 
noteworthy that this proposed preconditioner is user-friendly and can improve the 
robustness, efficiency, and scalability of the solution to the Stokes equation com- 
pared with pure AMG methods. This indicates that the proposed method can also 
make a useful building block for other Poisson-based solvers. 

Throughout this paper, we will use the following notation. The symbol Lq 
denotes the space of all square integrable functions, L 2 , whose entries have zero 
mean values. Let H k be the standard Sobolev space of the scalar function whose 
weak derivatives up to order k are square integrable, and, let || • and | • \k denote 
the standard Sobolev norm and its corresponding seminorm on H k , respectively. 
Furthermore, || • \\k,u and | • \k lU i denote the norm || • and the semi-norm | • \k 
restricted to the domain u C fi, respectively. We use the notation X < (>)Y to 
denote the existence of a generic constant C, which depends only on f2, such that 
X < (>)CY. 

The rest of the paper is organized as follows. In Sj2j using the auxiliary space 
preconditioning framework, we present the construction of the geometric hierarchy 
and a two-level method for the Poisson equation from high-order finite element 
discretizations. In S|3]and Sj4j we analyze the convergence of the proposed two-level 
algorithm by casting it into the augmented matrix formulation by Griebel |23j . In 
S|5j we discuss the preconditioning techniques for saddle point problems from the 
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mixed finite element for the Stokes equation. In fj6j a number of numerical exper- 
iments are reported and summarized to demonstrate the efficiency and robustness 
of our parallel implementation. 

2. A GEOMETRIC-ALGEBRAIC MULTIGRID ALGORITHM 

This section is devoted to present the algebraic multigrid methods for the Poisson 
equations discretized by the higher order finite element methods with geometric 
hierarchy between higher order finite elements and the lower order finite element 
spaces consisting of piecewise linear elements. 

Let C K d be a bounded polyhedral domain. We consider the Poisson equation 

(1) - Au = f in Q, 

subject to zero Dirichlet boundary condition u = on dfl. We then consider the 
following weak formulation of ([!]): Find «eV= Hq(CI), such that 

(2) a(u,v) = (f,v) VveV, 
where 

(3) a(u, v) := / \7u-\7vdx and (f,v):= / / v dx Vu, v £ V. 

Jn Jn 

We now discretize the equation ([2| using the finite element method. To introduce 
the finite element spaces. Assume that Tk is a shape-regular triangular (tetrahedral) 
mesh of ft. For any T £ Th, let P k (T) be the set of polynomials on T of degree 
less than or equal to fc. We denote the piecewise continuous P k Lagrangian finite 
element space as Vh ■— P k <°. In this paper, Vh denotes a finite element space 
consisting of the fc-th (fc > 2) order piecewise continuous polynomials, such as 
quadratic, cubic or quartic polynomials. That is to say 

(4) V h := {v e C(Q) : v\ T € P k (T), \/Te%} = spanfai, . . . , <t> n J, 

where n h is the total number of degrees of freedom and {<^i}j=i,...,n h are the stan- 
dard fc-th order Lagrange basis functions. The discrete weak formulation of |2]) can 
be written as 

(5) a(u h ,v h ) = (f,Vh) Vv h eV h . 

We introduce an auxiliary space, the continuous piecewise linear polynomial 
space, 

(6) V H := {v G C(Q) : v\ T G P 1 (T), VT 6 %} = span{^i, . . . , ^ H }, 

where {tpj}j=i,...,n H are the canonical basis functions or the hat functions. We 
denote, by {x 1 } }i—i t „. t7l and {xf}^^,..^ , the set of evenly-spaced nodes where 
the degree of freedom (DOF) for the Lagrange finite spaces Vh and Vh are defined, 
respectively. Figure [l] shows the local ordering of x\ on a single simplex in 3D. 

Throughout this paper, we use the following convention for the vector representa- 
tion of a function in Vh- For any Vh = X)i=i v i4>i G Vh an d wh = J2i=i Wi^i € Vh, 
we denote by v/ t and w H , the vector representation of Vh and Wh, respectively. 
Namely, v h = (vi, . . . , v n/ ) T and = (wi, . . . , Wn H ) T . Similarly, the symbol f h 
denotes (/i, . . . , fn h ) T with fa = (f,<f>i) for all i = l,...,n h . The equation ^ 
can then be cast into the following equivalent finite-dimensional linear system of 
equations: Find u h such that 

(7) A h u h = f fc , 
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Figure 1. Local numbering of nodes x^'s in a single tetrahedron 
used for P2, P3 and P4 FEMs 



where Ah is a symmetric positive definite matrix with (Ah)ij = a(cj>j, (f>i). 

We are now at the position to present a geometry-based algebraic multigrid 
(GAMG) method by using the auxiliary linear finite element space. For this pur- 
pose, we start by introducing the transfer operators between V/j and Vh- Since 
k > 2, any basis function tpj € Vh can be represented by the basis functions 
{4'i}i<i<n l of the space Vh- Namely, for any j — l,...n H , there exists Cj = 
(c),...,cf h ) T e K"" such that 

(8) ^-=E4^- 

i=l 

These coefficients cj's can be easily obtained by A = 4'j{ x 'i) f° r anv i — 1> ■ • ■ , n h 
and j = 1,..., n H . Hence, we have that 

(9) ^= ^(*?)&> 

i S jv,f 0) 

where N^{j) = {i 6 {1, . .. ,n h } : supp(0i)nsupp(?Aj) ^ 0}- We define the transfer 
operators 

(10) l P = (ci,...,c„ H ) eR""*"" and | R = 

We note that the transfer operators Ir and lp can be generated easily with the 
mesh 7h available. An alternate (algebraic) approach for generating them has been 
introduced by Shu, Sun, and Xu [43] , 

Now, let A h and A H be the stiffness matrices defined by ^ on the finite clement 
spaces Vh and Vh, respectively. Then we have the Galerkin relation A^ = IrAJp. 
Let Gh be a smoother and we can state the following two-level algorithm: 

Algorithm 2.1 (A Two- level Method). Given an initial iterate, u° on the fine 
grid, we perform the following steps until convergence for I = 0, 1, . . . 
Step 1. Solve the coarse grid equation 

A H w H = l R (f h - A h uf ) 

Step 2. Correction 

S <> = u l + 'p w « 
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Step 3. Postsmoothing 

ui +1 = u h + G h (i h -A h n h ) 



Remark 2.1 (The Two-Level Method and GAMG). In Step 1 of Algorithm |2T 
we can employ an AMG method or a few AMG cycles to solve the coarse-level 
problem approximately. We can also add a presmoothing step in front of Step 1 to 
make the method symmetric. This particular version of Algorithm |2 . 1 1 will then be 
referred to as the geometric-algebraic multigrid (GAMG) method as easily available 
geometric information (mesh) is used in our implementation of Step 3. 

3. Augmented matrix formulation of the two-level method 



In this section, we introduce an equivalent form of Algorithm 2.1 in terms of the 
augmented algebraic systems based on so-called the redundant representation of 
the solution space. This observation was originally made by Griebel [23 , however, 
the convergence rate estimate in this framework has not been seen in literature. 
The main idea lies in the following redundant representation of the functions in the 
space Vh' For any v h e Vh, we have the representation 

(ii) v h = Vjipj + y] Vjfc. 

i=l i=l 

We notice that the above representation is not unique since the basis functions 



used in the representation are not independent, which is why (111 is also called 



the redundant representation of functions in Vh- Based on (11), we can consider 
the discrete weak formulation: Let Vh — spanj^i, . . . , ipn H , ■ ■ ■ , <^n } and find 
v h = Ya=i Vi4>i + Yh=i e Vh, such that 

(12) a(v h ,Wh) = (f,vih) Vwh€Vh- 

Let v = (vi, . . . ,v n ) T and v = (v\,. . . ,V n ) T . It is then easy to establish that the 



resulting system of equations from the aforementioned weak formulation (12 1 leads 
to the following augmented matrix systems [2"3] : 



Lemma 3.1. Let lp and Ir be the prolongation and restriction given in (10), re- 
spectively. Then the problem |5|) can be written as the following matrix equation 

/ 13 \ v \ _j , „ ( IrA^Ir l R A ft \( v\ ( lRf/, 



_ v J • V A,l ' p & h J \ v J \ fh 

Proof. From the following relation that 

(14) i/) i = '^24 < f ) j' i = l,...,n H , 

3=1 

we can deduce that 

(15) a(tpi, if>j) = (IrA/jIp)^- 
and 

(16) a(i/> ii <f> j ) = a\jr t c$<t>k, 0i) = a{4> k , fa) = (\ R A h )i 

fe=l k = l 
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We also note that 

(/, fr) = (/, £ cfyk) = J2 = l R f h . 

fc=l fc=l 

This completes the proof. □ 



There are a few interesting properties of the augmented matrix A given in ( 13 ) 
which will be useful in later sections. First of all, the matrix A is singular with 
positive diagonal entries. Moreover, the range and the null space of A denoted by 
1Z{A) = N 1 - and N = Af(A), respectively, can be characterized as follows: 

K{A) = { ( ' R V V " ) -v h e V h } and AT (A) = { ( ^ ) : c H € V H 

Let e& for k = 1, . . . , n H + n h be the canonical basis for E n « +n h . We denote the 
solution space of the augmented matrix system ( 13 ) as 



(17) V:= \v= ( I ) : v h =J^v% V v h € V h \ C 



pn„+n. 



It is worthy to note that Algorithm ^. 1| [J ean analyzed in the framework of Successive 
Subspace Corrections (SSC) [52] with the subspace decomposition 

V = V + Vi + --- + V nh , 

where Vo = span{ei, . . . ,e n } and Vj = span{e„ H +j} for j = 1, . . . , n h . In this 
setting, the SSC method can be written as follows: 

Algorithm 3.1 (Successive Subspace Correction Method). Let u° E V be given. 
for £= 1,2,... 

for k = 0, 1, 2, . . . , n h 

Find w k E V k : (Aw k ,v k ) = (f,Vkj - (Aulz\,Vk) Vu fc € V k 

endfor 

i 

u = u„ 
endfor 

The error transfer operator of the above algorithm can be identified as 
(18) £ = {1- V nh )(l - V nh - X ) ■■■(!- Vo), 

where I : V H> V is the identity matrix and Vj : V i-> Vj for j = 0, . . . , n h is the 
,4-projection onto the space Vj. More precisely, "Po : V i-> can be defined as 

/ u + (l R A h l P )- 1 l R A /l w \ (v + P v\ 

where Po = (IrA/Jp) _1 IrA^. For j = 1, . . . , n h , we define the projections 
?> := H e, = ' e» H+J - = (Pi(lp« + «) s e J )e nj+J) 



■*"We can also analyze the Algorithm 2.1 with presmoothing by modifying the space decomposition 
slightly to make it symmetric. 
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where ej for j = l,...,n h is the canonical basis for M, n >> and Pj is A/,-projection 
onto the space spanje.,} given by 

J (A h e j)ei ) J 



Remark 3.1 (Algorithm 2.1 and Block Gauss-Seidel Method). We apply the space 
decomposition 

(19) V = V H +V h , 
where 

V H = span {ex, . . . ,^n H ) and V h = span{e„ H+1 , . . . , e„ H +„ h }. 
We decompose of the matrix Ah in the following form 

(20) A h = D - L - L T , 

where D = (eijj)j = i n h is the diagonal part of A^ and L = (£ij)i,j=i...., n , with 

iij = for i > j and ty = — a(0j,0j) for i < j, i.e., the strictly lower triangular 
part of Ah- Similarly we can decompose A as follows 

(21) A = V-£~£ T , 
where 

-n f IrAJp \ / 

23 := { D J and C := { -A h \ P L 

In fact, we can easily show that the two-level method Algorithm |2.1| is equivalent 
to the Gauss-Siedel method for the augmented system of equations (13 1: 

(22) v l+1 =v l + {V-L)- 1 {}-Av l ) £ = 0,1,... 



4. Convergence rate estimate for the two-level method 



In this section, we establish a convergence rate estimate for Algorithm |2 . 1 1 using 
the formulation introduced in the previous section. We denote a semi inner product 

1 /2 

(■, -)_4 : V x V y-¥ R and the induced semi-norm by | • \^ = (-, -jj . We now establish 
a convergence rate identity for the error transfer operator (18). Note that relevant 
estimates have been reported in [301 132) . but we provide a proof for completeness. 

Theorem 4.1 (Convergence Rate Identity). The convergence rate for the iterative 
method (22) can be given by 

(23) 



1^ = 1- 



1 

K' 



whe 



K = 1 



sup Jnf 



(S(u + c),(u+Z)) 



(u,u) A i 
where S = CD^ 1 C T and uj G Vi for j = 0, 



sup inf 

: eA f± ceM 

...,n h . 
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Proof. Let B := (V - C)~ l . From Remark [ju] we have £ = X - BA. By the 
definition of the semi-norm \£\ A , we have the following identity 

lc , 2 {£u,£u) A {{X-BA)u,(X~BA)u) A 

((X-BA)*(X-BA)u, u). 

= SUp y^r ^, 

a e j^ ( u , W M 

where (X — BA)* is the adjoint operator of X — BA with respect to the semi inner 
product (-,-)a- 

We notice that (X — BA)* — X — B T A, where B T is the adjoint of B with respect 
to the usual I 2 inner product. Furthermore, since 

(X-BA)*(X-BA) = X - B T {B- T + B- 1 - A)BA 

= X-(B- 1 V- 1 B- T y 1 A = X-(A + S)- 1 A, 

we then have that 

1*11 = 1- inf M+^T^Sa. 

We shall now define M = A X I 2 {A + S)~ 1 AA 1 / 2 and obtain the following identity : 

(u,u) A {u,u) A 
K = sup j-— — = sup 



SeJ £. ((.4 + 5)-Uu, 5*)^ sg^L (4- 1 /2x^-V2^ 5)^ 



sup 



The last equality is from the fact that M : Af ± n- AT 1 - is symmetric and positive 
definite matrix and by replacing u by AA x l 2 u. Let Q : V H> A/"" 1 be the £ 2 -orthogonal 
projection, for which 

(24) AQv = Av VweV. 

We now write (A + 5) _1 4u = w + c(w), where w := Q(A + S)~ 1 Au E Af 1 - and 
c{w) £ Af. Note that c(w) is uniquely determined by w. Therefore, due to the fact 
that 

A~ 1/2 MA~ 1/2 u = (A + Sy 1 Au = w + c(w) and {A + S)(w + c(w)) = Au, 

we immediately obtain that 

{{w + c(w)), (A + S)(w + c(w)) . , ((A + S)(w + c),(w + c)) 

K = sup „ = sup inf 



>y the following reasoning: 
((A + S)(w + c),(w + c)) 



The last equality is obtained by the following reasoning: For a fixed w € Af , 
we assume that 



(25) £ = argjnf 
Then the first order optimality condition implies that £ satisfies 

(26) ((A + S)(w + £), c) =0 Vc e A/". 

This in turn implies that (4 + <S) (u> + £) € AT and £ = c(w) . This completes the 
proof. □ 
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To establish a uniform convergence rate of the proposed two-level method (Algo- 
rithm 2.1), we show that K in Theorem 



4.1 



can be bounded by a generic constant 
independent of mesh size. The standard prolongation Iff : Vh l— ^ Vh (the inclusion 
operator) and the restriction operator I H : Vh i— > Vh can be written as 



(27) 



)l P Wfj and I H v h = (ip 1 ,...,ip n )\ R v h 



We introduce two additional restriction operators: the usual L 2 projection Qh ■ 
Vh i-> Vh and the elliptic projection Ph '■ Vh i-> Vh denned by 

(Q H v,w H ) = (v,w H ) and a(P H v,w H ) = a(v,w H ) yv£V h ,w H £V H , 

respectively. It is clear that (A/Jpv H , Ipv^ ) = \\v H \\ 2 and the vector representation 
of Phv is simply IrA^v. Therefore, we have that, for any v £ Vh, the following 
inequality holds: 

(28) (AhlpA^lRAhV, IpA^lRAhv) = \\P H v\\l < \\v\\l 

We are now ready to prove the main theoretical result in this paper. Note that 
there are abundant literatures regarding the uniform convergence of two grid meth- 
ods. Unlike those reported in literatures such as |43j , our convergence estimate uses 
a different and novel technique, which is based on a convergence rate estimate for 
a singular system from the multilevel and redundant decomposition of the solution 
space. 

Theorem 4.2 (Uniform Convergence). The following estimate holds true 

{S(u + c), (u + c)) 



K = 1 



sup jnf 



sup jnf 



E> Pife?^ 



(u,u)a 

where Uj £ Vj for j = 0, 1, . . . 
Proof. Let u £ Af 1 - and c £ N be given by 



2 

A < 



sup 



n h such that Ylj=o u j = u + c - 



Qnv h \ 



1<1, 



(29) 



and c 



where v, £ 



From Theorem |4.1 1 we have the identity 



r"* -p. ( y n h 



K = sup jnf 

We note that the following identity holds true 
(30) {u,u) A =\\{\ + \p\*)v h \& =|| 



I%v 



H u h \\l> 



where v h is the vector representation of v h £ Vh in terms of the basis functions 
{4>i} i h l - Therefore, by choosing w H as the vector representation of v H — Qhv h , we 
obtain the following relation 



!rv„ + v f 



v,. - 



lpV t 
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and, in turn, 



E 

i=0 



E 

i=l 

E 



Pi 22 q i e . 



|lRV h +v H + P K-l P v H )||' 



where (ui, . . . ,w„ h ) T = v h — lp v ff ■ Next, we estimate the two parts in the above 
identity separately. 

It is straight-forward to observe that 

HPoK-lpvJUi^ = (P (v h -lpv ff ),P (v h -lpv if )) Aff 

= OiAK - l P v ff ),A- 1 l R A (i (v h - l P v H )) 

= (A.K-IpvJJpA-MrA.K-Ipv^)) 

< || v — Ipv„||a , due to the inequality (28 1. 

h 

Therefore, we obtain that 

||| R v h +v H +Po(v h -lpv ir )||' < lllplpv. + lpvji +||Po(v A -l P v H )||! 



(31) 



< HlplRV fc +vJ|i +||v,-l P v H ||i 

h n 

= \\lHV h +v h \\l + \\v h -Q H v h \\l 



Let Qi = supp 4>i and A^ = ((%■) with Oy = a(4>j,(f>i). Using the Cauchy-Schwarz 
inequality, we have the following estimate 



E 



= E ( (^l le f\(E^H A <>(E^) 

n h i n h 

= 'Yl,a{<t>i,<t>i)~ 1 a\<l>u'Y^'> 



3 t*3 



< 



E 



v E^ 



«E E h 

i=l ]£N k {i) 



where d is the dimension and Nk(i) = {j € {1, . . . ,n h } : ilj fill, / 0}. Norm 
equivalence leads to the following estimate that 

E h d u^<\\v h -Q H v h \\l at . 
jeN k (i) 
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Therefore, we arrive at the conclusion that 
2 



E 

i=l 



Ms* 



< h-t^K-QnvJl^Sh-^-QBvJl 



j=i 



(32) 



< \\(I-Q H )(v h +I h H v h )\\l 



By combining (31 1 and (32), we obtain the desired estimate for K, which completes 
the proof. □ 

5. Block preconditioners for the Stokes equation 

In this section, we consider efficient Poisson-based preconditioning techniques for 
the Stokes system with no-slip boundary condition: Find velocity u and pressure 
p, such that 



(33) 




f 






in 
in 

on dfl, 



where f2 C M. d (d = 2, 3) is a bounded polygonal domain and f is a given function. 

There have been extensive discussions on discretizations of the Stokes equation. 
In this paper, we shall focus on the mixed finite element discretizations; see, for 
example, 07H221[T7]. When solving this problem with mixed finite clement methods 
(MFEM) , a pair of discrete function spaces for velocity and pressure must be chosen 
carefully so that it is stable, i.e., satisfying so-called the inf-sup condition. There 
are a number of important classes of stable pairs. In particular, a family of Hood- 
Taylor finite elements |28j that approximates the velocity by continuous piecewise 
fc-th order polynomials (P fc '°) and the pressure by continuous piecewise (fc — 1)- 
th order polynomials (p fe_1 '°) with k > 2. is known to be stable for full three 
dimensional Stokes equation [7\. Another important stable elements shown only 
in 2D, is the Scott-Vogelius elements jH] with k > 4, which approximates the 
velocity by continuous piecewise fc-th order polynomials (P fc '°) and the pressure 
by discontinuous piecewise (k — l)-th order polynomials (P ' — 1 ). These pairs of 
mixed finite elements are very promising because they preserve the incompressibility 
condition, namely, discrete divergence free condition in the strong sense. 

Assume that the coefficient matrix arising in mixed finite element discretizations 



of ( 33 ) can be written as 



F 



A 
B 



B T 




Here, B is the discrete divergence operator, i.e., — Vj,«; and, A is the block diagonal 
matrix with the discrete Laplace matrix — A/j on its diagonal. Let £ u and x p be 
the unknown vectors of the velocity field and pressure, respectively. Then we need 
to solve the system of linear equations 



(34) 



F is symmetric and positive semidefinite and we can apply Krylov subspace 
methods for indefinite problems such as the minimal residual (MINRES) method [33] 



Fx = b 


or 


F 


x u 




b 








- %p - 
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and the generalized minimal residual (GMRES) method (41] to solve (34 1. It is well- 
known that the convergence rate of such methods is governed by three parameters: 
the condition numbers of A, B and the relative scaling between them. Generally, 
cond(A) = 0(h~ 2 ), therefore, a direct application of these methods will yield a 
slow convergence and the main difficulty in solving the Stokes equations lies in 
constructing "good" preconditioners for the elliptic operator A 7 which is typically 
given by the discrete elliptic operator with higher order finite elements. 

A lot of efforts have been devoted on solving the saddle point problems arising 
from mixed finite element methods for the Stokes equation; see (THJ, and 
references therein for details. A few efficient multigrid-type solution methods have 
been proposed for the Stokes equation [TH EJ E01 [8]. In this paper, we focus 
on the Poisson-based block diagonal and block triangular preconditioners [TOl EE] • 
Recently, parallel version of these preconditioning algorithms attracted a lot of 
interests due to their efficiency and easiness for implementation; see for example [341 
ETJ. 



Denote the Schur complement by S 
izations of F, 



BA 1 B T . The block triangular factor- 



A B T 
B 



(35) 



In 


° 1 




B T 


BA- 1 


Ip J 


u 


-s 


In 


° ) 







BA- 1 


I P J 


u 


-s 



A-^B 1 



motivate a block upper triangular preconditioner [10 

-l 



(36) 



Qi 



A B T 
-S 



and an even simpler block diagonal preconditioner 



(37) 



A 




-S 



In either Q t or Qd, it requires to obtain certain approximations to A~ 1 and S . 
Since A is a discretization of the Laplace operator, we form an approximation to 
A- 1 by applying one multilevel V-cycle to A. As for the Schur complement part, we 
approximate S with the pressure mass matrix M p and solve M p equation using the 
conjugate gradient method with diagonal preconditioner. We note that there are 



different ways to form the preconditioning actions based on (36) and (37); see [BJ. 



6. Numerical experiments 

In this section, we test the performance of the proposed Geometric-Algebraic 
Multigrid (GAMG) method (Algorithm 2.1 with presmoothing; see Remark 2.1) 
and compare it with the corresponding AMG method with focus on their robustness, 
efficiency, and parallel weak scalability. For this purpose, we use two simple test 
problems — one is the 3D Poisson equation and the other is the 3D Stokes equation — 
on a unit cube with unstructured tetrahedral meshes. We pay special attention to 
the performance of both methods for higher-order finite element discretizations. 
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6.1. Implementation. All numerical tests are carried out on the LSSC-H cluster 
at the State Key Laboratory of Scientific and Engineering Computing (LSEC), 
Chinese Academy of Sciences. The LSSC-IE cluster has 282 computing nodes: Each 
node has two Intel Quad Core Xeon X5550 2.66GHz processors and 24GB shared 
memory; all nodes are connected via Gigabit Ethernet and DDR InfiniBand. To 
make a fair comparison, we use zero right-hand side and start from a random 
initial guess in our tests. In this section, "^It" denotes the number of iterations, 
"DOF" denotes the degree of freedom, "CPU" denotes the computation wall time 
in seconds, and "RAM" denotes the memory usage per processor in MB. 

Our implementation is based on several open-source numerical packages. The 
finite element discretization for the Poisson equation and the Stokes equation is 
implemented using PHG [21 [S5] . PHG is a toolbox for developing parallel adaptive 
finite element programs on unstructured tetrahedral meshes and it is under active 
development at LSEC. PHG is also employed to build the two-level GAMG setting, 
namely to generate the transfer operators (prolongation and restriction). 

The solvers are implemented using PETSc [2] and BoomerAMG in hypre pQ. 
Due to limited space, we only report the results for the Flexible GMRES (FGM- 
RES) method [3S gO] in PETSc and BoomerAMG with the PMIS method g5] 
for coarsening, the Extendcd+i+cc method gl] for interpolation, and the hybrid 
Gauss-Seidel method for smoothing. This particular setting provides good efficiency 
and scalability for the linear systems in our numerical tests. For more numerical 
tests for various choices of iterative methods and different types of AMG methods, 
we refer to Lee, Leng and Zhang |3"T] . 

Remark 6.1 (Number of Smoothing Sweeps). In multigrid method, there is a 
trade-off in number of smoothing sweeps used in each cycle. More smoothing sweeps 
in one cycle will cost more computation time in each multigrid cycle but may reduce 
total number of cycles. Another consideration about the number of smoothing 
sweeps is that for linear system that is harder to solve, more smoothing sweeps 
might lead to better convergence rate. In this paper, we use only one pre and post 
smoothing sweep in our experiments. 

6.2. Test Problem 1 — the Poisson equation. The GAMG solver for the Pois- 
son equation is implemented as follows: We first pass the linear systems to the 
FGMRES iterative method of PETSc, then we use one multilevel V-cycle as the 
prcconditioncr for FGMRES. In each multilevel cycle, the coarse level problem (cor- 
responding to the -P 1,0 finite element space) is solved with BoomerAMG in hypre. 
The AMG preconditioner for the Poisson problem is simple, we pass the linear 
system to the FGMRES method of PETSc and employ BoomerAMG as a precon- 
ditioner. In both methods, the stopping criteria is that the relative residual is less 
than 1CT 6 . 

In Table [T] in SjlJ we have shown that the AMG method could be very sensitive 
to the strength threshold 9 when applied to the linear systems arising in higher 
order finite element discretizations. The strength threshold or strong threshold 
determines strength of connections, i.e., a point (variable) i strongly depends on j 
if 

—Oi j > Omax(-a it k)- 

k^i 

The default value of in BoomerAMG is 0.25, which usually works well for 2D 
Laplace operators and a larger value, like 0.5, is suggested for 3D Laplace operators. 
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But neither of them works for the P ' finite element discretization. Hence we start 
by testing both AMG and GAMG methods for various values of 9 and report CPU 
time and memory consumption in Table [2] [3j [4] 

We notice that: (1) The AMG method under consideration performs reasonably 
well even for high-order finite element methods. (2) However, it is very sensitive to 
the strength threshold 9 for the fourth order finite element method. (3) The GAMG 
method, on the other hand, is very robust with respect to 9. And, in general, 
it converges faster (from 1.5 times up to 30 times, see Table [5]) and consumes 
less memory (by 10% to 50%) compared with the AMG method. (4) For large 
3D linear systems arising from higher-order finite element discretizations, large 
strength threshold often works much better. In the rest of the comparisons, we will 
fix the parameter 9 = 0.8, which is the best choice for AMG, but not necessarily 
for GAMG. 

Table 2. Iteration number, CPU time, and memory usage of the 
AMG and GAMG preconditioned Krylov subspace method for the 
3D Poisson equation on unstructured tetrahedral mesh (P 2 '° finite 
element, about 430K DOF per processing core). 



Method (9) 


1 Core 


8 Cores 


64 Cores 


#It CPU RAM 


#It CPU RAM 


#It CPU RAM 


AMG (0.2) 


7 11.75 1141 


6 22.82 1424 


6 32.71 1320 


GAMG (0.2) 


7 4.16 790 


7 9.25 1170 


7 8.57 1072 


AMG (0.4) 


7 8.00 1124 


7 18.13 1428 


7 25.50 1294 


GAMG (0.4) 


7 4.16 790 


7 9.39 1170 


7 8.79 1073 


AMG (0.6) 


8 5.92 1080 


8 15.06 1381 


8 20.29 1255 


GAMG (0.6) 


7 4.09 790 


7 8.55 1174 


7 8.19 1074 


AMG (0.8) 


11 5.10 1022 


13 14.64 1280 


13 17.85 1187 


GAMG (0.8) 


9 4.35 790 


9 9.13 1168 


9 9.49 1072 



Table 3. Iteration number, CPU time, and memory usage of the 
AMG and GAMG preconditioned Krylov subspace method for the 
3D Poisson equation on unstructured tetrahedral mesh (P 3 >° finite 
element, about 650K DOF per processing core). 



Method {9) 


1 Core 


8 Cores 


64 Cores 


#It CPU RAM 


#It CPU RAM 


#It CPU RAM 


AMG (0.2) 


8 32.24 2345 


8 42.98 2133 


7 75.84 2419 


GAMG (0.2) 


12 13.14 1216 


12 14.11 1410 


11 19.46 1522 


AMG (0.4) 


9 23.94 2278 


8 35.15 2219 


8 65.24 2507 


GAMG (0.4) 


12 13.16 1217 


12 14.94 1412 


11 19.99 1523 


AMG (0.6) 


10 16.76 1929 


10 27.02 2067 


10 48.07 2291 


GAMG (0.6) 


12 13.12 1222 


12 14.86 1412 


11 20.21 1523 


AMG (0.8) 


13 13.74 1696 


14 23.29 1831 


15 38.54 2137 


GAMG (0.8) 


12 13.08 1216 


12 14.86 1411 


11 20.19 1525 
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Table 4. Iteration number, CPU time, and memory usage of the 
AMG and GAMG preconditioned Krylov subspace method for the 
3D Poisson equation on unstructured tetrahedral mesh (P 4 '° finite 
element, about 460K DOF per processing core). 



Method (6>) 


1 Core 


8 Cores 


64 Cores 


#It CPU RAM 


#It CPU RAM 


#It CPU RAM 


AMG (0.2) 


19 42.47 1986 


28 73.22 2043 


93 438.8 2352 


GAMG (0.2) 


16 16.45 1023 


18 20.03 1117 


17 25.77 1534 


AMG (0.4) 


16 22.12 1560 


36 57.40 1840 


>500 >800 2258 


GAMG (0.4) 


16 16.41 1023 


18 20.05 1117 


17 25.10 1535 


AMG (0.6) 


17 16.44 1679 


25 33.83 1683 


245 356.2 2085 


GAMG (0.6) 


16 16.42 1023 


18 19.07 1117 


17 25.64 1539 


AMG (0.8) 


17 13.08 1319 


19 22.79 1555 


19 36.50 1971 


GAMG (0.8) 


16 16.47 1023 


18 20.93 1118 


17 25.23 1534 



Table 5. Speedup of GAMG compared with AMG for solving the 
discrete Poisson equation with finite element methods on unstruc- 
tured tetrahedral meshes (solved using 64 processing cores). 



Finite Element DOF 


9 = 0.2 


9 = 0.4 


9 = 0.6 


9 = 0.8 


P 2 ' 27M 


3.8 


2.9 


2.5 


1.9 


p3,o 43M 


3.9 


3.3 


2.4 


1.9 


P 4 <° 31M 


17.0 


32.0 


13.9 


1.5 



Now, let L to be the number of levels in multilevel hierarchy and level 1 is the 
finest level. The operator complexity, C op := ^Zi=i finz{Ai)/ nnz(Ai), is the ra- 
tio between the total number of nonzeros (nnz) of all levels and the number of 
nonzeros of the finest level. The operator complexity is an important indicator of 
expense of multigrid type methods, not only for the storage requirements of mul- 
tilevel preconditioners, but also for computational complexity for applying them. 
In our experiments, we use the PMIS coarsening strategy and the Extended+i+cc 
interpolation method in both AMG and GAMG. As summarized in Table [6j we 
notice that GAMG action is much cheaper than AMG. In fact, for P 3,0 and P 4 ' , 
the operator complexity of GAMG is close to 1.0. This is due to the coarse level 
space, P 1 '°-finite element space, contains considerably less degree of freedom and 
gives much less number of nonzeros in the coefficient matrices. In Table [6] the 
P 1 '°-DOF column gives the degree of freedom for the corresponding coarse level. 

In the rest of this subsection, we consider weak scalability of the proposed GAMG 
method. The maximal number of processing cores is 1024 (on 128 nodes) and the 
maximal degree of freedom in our tests are about 5 x 10 8 . We notice that both the 
AMG and GAMG preconditioned Krylov subspace methods yield good optimality 
and scalability; see Figures [2j [3j and [3] We note that this comparison was done 
with the "good" parameter (9 — 0.8); otherwise, the performance of AMG will 
deteriorate quickly for P 4,0 finite clement. 
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Table 6. Operator complexities of AMG and GAMG (PMIS and 
Extendcd+i+cc) in single core tests. 



Finite Element 


DOF 


pi,o D OF 


c 


AMG 


P 1 '°-AMG 


GAMG 


pl.O 


429877 




2.00 






p2,Q 


435825 


59495 


1.67 


1.76 


1.12 


p3,0 


736608 


31087 


1.56 


1.96 


1.02 


pi.O 


460899 


8165 


1.36 


1.70 


1.01 




Figure 2. Parallel (weak) scalability of AMG and GAMG for P 2 > 
FEM for the Poisson equation. 




Figure 3. Parallel (weak) scalability of AMG and GAMG for P 3 >° 
FEM for the Poisson equation. 

6.3. Test Problem 2 — the Stokes equation. In this section, we consider the 
Hood- Taylor family mixed finite element methods for the steady Stokes flow on a 
3D lid driven cavity domain. Again, we test the AMG and GAMG methods with 
"good" strength threshold 9 = 0.8. We choose to stop the outer FGMRES iteration 
if the relative residual is smaller than 10 -8 . 

Similar to the Poisson solver described in §6.2[ the GAMG solver for Stokes test 
is implemented as follows: We pass the linear systems to the FGMRES solver in 
PETSc and we apply the block triangular preconditioner Q t described in ^5] for the 
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FGMRES method. The performance of block diagonal preconditioner Qd can be 
found in |31j . For the lower-right block (corresponding to the Schur complement), 
we solve it with the diagonal preconditioned PCG method in PETSc. For the 
upper- left blocks (corresponding to the Poisson equation), we approximate it with 
one GAMG V-cycle for the discrete Poisson equation. The AMG solver for Stokes 
test is similar except that the upper-left block was solved with one AMG V-cycle. 

From Figure [5j[6| and[7j we immediately notice that: (1) In general, the AMG 
preconditioner performs reasonably well, even for high-order elements. (2) However, 
the convergence rate of AMG deteriorates with the size of problems and with the 
order of the mixed finite element. (3) The GAMG preconditioner yields much 
better convergence rate as well as scalability, especially for higher order elements. 
In particular, the iteration number does not increase as DOF increases. 




Figure 5. Algorithm optimality and parallel (weak) scalability 
of AMG and GAMG preconditioned FGMRES methods for the 
p2,o _ pi,o Hood- Taylor mixed finite element. 



Remark 6.2 (Intermediate Approximation Spaces). In this paper, we only use 
two-level approximation with P 1 ' finite element space as the coarse level. One 
can imagine that intermediate (larger) auxiliary spaces could be used to improve 
performance. Since the convergence rate of the proposed two-level algorithm is 
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Figure 6. Algorithm optimality and parallel (weak) scalability 
of AMG and GAMG preconditioned FGMRES methods for the 
p3,o _ p2,a Hood- Taylor mixed finite element. 




Figure 7. Algorithm optimality and parallel (weak) scalability 
of AMG and GAMG preconditioned FGMRES methods for the 
p4,o _ p3,o Hood- Taylor mixed finite clement. 



optimal (does not deteriorate as size of the problem increases) in our experiments, 
we decide not to do it in order to keep the implementation as simple as possible. 



7. Conclusions 

In this paper, we investigate an auxiliary space preconditioning method for high- 
order finite element discretizations of the Laplace operator in 3D. Modern parallel 
AMG techniques like PMIS /Extended+i+cc give good parallel scalability, but only 
if their parameters like strong strength threshold are chosen appropriately. On the 
contrary, the proposed auxiliary space preconditioner is very robust with respect to 
coarsening parameters, especially when applied as a building block of Poisson-based 
solvers for the Stokes equation in 3D. Furthermore, the proposed method yields 
smaller operator complexity, which leads to less memory usage and computational 
complexity. 



20 



YOUNG-JU LEE, WEI LENG, AND CHEN-SONG ZHANG 



Acknowledgement 

The authors would like to thank Dr. Tzanio Kolev, Prof. Jinchao Xu, and 
Prof. Lin-Bo Zhang for their insightful comments and suggestions. Lee has been 
supported in part by NSF-DMS 091528 and Zhang has been supported in part by 
NSF-DMS 0915153. 



References 

HYPRE (High Performance Preconditioners) . http://www.llnl.gov/casc/linear solvers. 
PETSC (Portable, Extensible Toolkit for Scicntic Computation). http://www- 
unix.mcs.anl.gov/petsc. 

PHG (Parallel Hierarchical Grid), http://lsec.cc.ac.cn/phg/. 

O. Axelsson. Iterative solution methods. Cambridge University Press, Cambridge, 1994. 
A. H. Baker, R. D. Falgout, T. V. Kolev, and U. M. Yang. Scaling hypre's multigrid solvers 
to 100000 cores. In M. B. et Al., editor, High Performance Scientific Computing: Algorithms 
and Applications, volume 27344. 

M. Benzi, G. H. Golub, and J. Liesen. Numerical solution of saddle point problems. Acta 
Numerica, 14:1-137, May 2005. 

D. Bom. Three-Dimensional Finite Element Methods for the Stokes problems. SI AM J. Nu- 
mer. Anal, 34:664-670, 1997. 

D. Braess and R. Sarazin. An efficient smoother for the Stokes problem. Applied Numerical 
Mathematics, 23(1):3-19, Feb. 1997. 

J. H. Bramble. Multigrid Methods, volume 294 of Pitman Research Notes in Mathematical 
Sciences. Longman Scientific & Technical, Essex, England, 1993. 

J. H. Bramble and J. E. Pasciak. A preconditioning technique for indefinite systems resulting 
from mixed approximations of elliptic problems. Math. Comp., 50(181):1-17, 1988. 
A. Brandt. Multigrid technigues: 1984 guide with applications to fluid dynamics. GMD- 
Studicn Nr. 85. Gcsellschaft fur Mathcmatik und Datcnvcrarbcitung, St. Augustin, 1984. 
A. Brandt. Algebraic multigrid theory: The symmetric case. Appl. Mathematics of Compu- 
tation., 19:23-56, 1986. 

A. Brandt. Multiscale scientific computation: Review 2001. T.J.Barth et.al eds. Springer. 
Springer, 2002. 

A. Brandt and N. Dinar. Multigrid Solutions to Elliptic Flow Prob- lems. In S. Parter, editor, 

Numerical Methods for Partial Differential Eguations, pages 53 147. 1979. 

A. Brandt, S. F. McCormick, and J. W. Ruge. Algebraic multigrid (AMG) for sparse matrix 
equations. In D. J. Evans, editor, Sparsity and Its Applications. Cambridge University Press, 
Cambridge, 1984. 

J. J. Brannick and R. D. Falgout. Compatible relaxation and coarsening in algebraic multigrid. 
SI AM J. Sci. Comput., 32(3):1393-1416, 2010. 

F. Brezzi and M. Fortin. Mixed and hybrid finite element methods. Springer- Verlag, 1991. 
H. Elman, D. Silvester, and A. Wathen. Finite elements and fast iterative solvers: with 
applications in incompressible fluid dynamics. Oxford University Press, USA, 2005. 
R. D. Falgout. An introduction to algebraic multigrid. Computing in Science and Engineering, 
8:24-33, 2006. 

R. D. Falgout, J. E. Jones, and U. Meier Yang. Pursuing scalability for hypre's conceptual 
interfaces. ACM Trans. Math. Software, 31(3):326-350, 2005. 

T. Geenen, M. ur Rehman, S. P. MacLachlan, G. Segal, C. Vuik, A. P. van den Berg, and 
W. Spakman. Scalable robust solvers for unstructured FE geodynamic modeling applications: 
Solving the Stokes equation for models with large localized viscosity contrasts. Geochemistry 
Geophysics Geosystems, 10(9):1-12, Sept. 2009. 

V. Girault and P. A. Raviart. Finite element methods for Navier-Stokes equations. Springer- 
Vcrlag, Berlin, 1986. Theory and algorithms. 

M. Griebcl. Multilevel algorithms considered as iterative methods on semidefinite systems. 
SIAM Journal on Scientific and Statistical Computing, 15:547-565, 1994. 



SCALABLE GAMG FOR POISSON 



21 



[24] M. Griebel, B. Metsch, D. Oeltz, and M. A. Schweitzer. Coarse grid classification: A parallel 
coarsening scheme for algebraic multigrid methods. Numerical Linear Algebra with Applica- 
tions, 13(2-3):193-214, 2006. Also available as SFB 611 preprint No. 225, Univcrsitat Bonn, 
2005. 

[25] W. Hackbusch. Multigrid Methods and Applications, volume 4 of Computational Mathemat- 
ics. Springer- Verlag, Berlin, 1985. 
[26] J. Heys, T. Mantcuffcl, S. McCormick, and L. Olson. Algebraic multigrid for higher-order 

finite elements. Journal of Computational Physics, 204(2) :520 - 532, 2005. 
[27] R. Hiptmair and J. Xu. Nodal auxiliary space preconditioning in H(curl) and H(div) spaces. 

SI AM J. Numer. Anal., 45(6):2483-2509 (electronic), 2007. 
[28] P. Hood and C. Taylor. A numerical solution of the Navier-Stokes equations using the finite 

element technique. Internat. J. Comput. & Fluids, 1:73-100, 1973. 
[29] M. Larin and A. Rcusken. A comparative study of efficient iterative solvers for generalized 

Stokes equations. Numer. Linear Algebra Appl., 15(l):13-34, 2008. 
[30] Y. Lee, J. Wu, J. Xu, and L. Zikatanov. A sharp convergence estimate of the method of 

subspacc corrections for singular systems. Mathematics of Computation, 77 (262):831-850, 

2008. 

[31] Y.-J. Lee, W. Leng, and C.-S. Zhang. Numerical study of a parallel geometric-algebraic 

multigrid method for the Stokes equation. In preparation. 
[32] Y.-J. Lee, J. Wu, J. Xu, and L. Zikatanov. Convergence analysis on iterative methods for 

semidefinite systems. J. Comp. Math., 26:797-815, 2008. 
[33] Y.-J. Lee, J. Xu, and C.-S. Zhang. Stable Finite Element Discretizations for Viscoclastic Flow 

Models. In Handbook of Numerical Analysis, volume XVI. 2011. 
[34] D. A. May and L. Morcsi. Preconditioned iterative methods for Stokes flow problems arising 

in computational gcodynamics. Physics of the Earth and Planetary Interiors, 171(l-4):33-47, 

Dec. 2008. 

[35] A. Muresan and Y. Notay. Analysis of aggregation-based multigrid. SIAM Journal on Sci- 
entific Computing, 30(2): 1082-1 103, 2008. 

[36] C. C. Paige and M. A. Saunders. Solutions of sparse indefinite systems of linear equations. 
SIAM J. Numer. Anal., 12(4):617-629, 1975. 

[37] J. W. Ruge and K. Stiiben. Algebraic multigrid, volume 3 of Frontiers Appl. Math., pages 
73-130. SIAM, Philadelphia, PA, 1987. 

[38] T. Rusten and R. Winther. A preconditioned iterative method for saddle point problems. 
SIAM J. Matrix Anal. Appl., 13(3):887-904, 1992. 

[39] Y. Saad. A flexible inner-outer preconditioned GMRES algorithm. SIAM J. Sci. Comput., 
14:461-469, 1993. 

[40] Y. Saad. Iterative methods for sparse linear systems. Society for Industrial and Applied 

Mathematics, Philadelphia, PA, second edition, 2003. 
[41] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving 

nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 7:856- 

869, 1986. 

[42] L. R. Scott and M. Vogelius. Conforming finite element methods for incompressible and 
nearly incompressible continua. In Large-scale computations in fluid mechanics, Part 2 (La 
Jolla, Calif., 1983), volume 22 of Lectures in Appl. Math., pages 221-244. Amer. Math. Soc, 
Providence, RI, 1985. 

[43] S. Shu, D. Sun, and J. Xu. An algebraic multigrid method for higher order finite element 

discretizations. Computing, 77(4):347-377, 2006. 
[44] H. D. Sterck, R. D. Falgout, J. W. Nolting, and U. M. Yang. Distance-two interpolation for 

parallel algebraic multigrid. Numerical Linear Algebra with Applications, 15(2-3):115-139, 

2008. 

[45] H. D. Sterck, U. M. Yang, and J. J. Heys. Reducing complexity in parallel algebraic multigrid 
prcconditioners. SIAM J. Matrix Anal. Appl, 27(4):1019-1039, 2006. 

[46] K. Stiiben. An introduction to algebraic multigrid. In U. Trottenbcrg, C. W. Oostcrlee, and 
A. Schiillcr, editors, Multigrid, pages 413-532. Academic Press, London, 2000. 

[47] R. Tcmam. Navier-Stokes Equations. North Holland, 1977. 

[48] U. Trottenberg, C. W. Oostcrlee, and A. Schiillcr. Multigrid. Academic Press Inc., San Diego, 
CA, 2001. With contributions by A. Brandt, P. Oswald and K. Stiiben. 



22 YOUNG-JU LEE, WEI LENG, AND CHEN-SONG ZHANG 

[49] P. Vanek, J. Mandel, and M. Brezina. Algebraic multigrid by smoothed aggregation for second 
and fourth order elliptic problems. Computing, 56:179-196, 1996. 

[50] S. P. Vanka. Block-implicit multigrid solution of Navier-Stokes equations in primitive vari- 
ables. J. Comput. Phys., 65(1): 138-158, 1986. 

[51] G. Wittum. Multi-Grid Methods for Stokes and Navier-Stokes Equations Transforming 
Smoothers: Algorithm and Numerical Results. Numerische Mathematik, 54:543-563, 1989. 

[52] J. Xu. Iterative methods by space decomposition and subspacc correction. SIAM Review, 
34:581-613, 1992. 

[53] J. Xu. The auxiliary space method and optimal multigrid preconditioning techniques for 
unstructured meshes. Computing, 56:215-235, 1996. 

[54] J. Xu. Fast poisson-based solvers for linear and nonlinear pdes. In R. Bhatia, editor, Proceed- 
ings of the international congress of mathematicians, volume 4. World Sci. Publ., 2010. 

[55] J. Xu and L. Zikatanov. On an energy minimazing basis in algebraic multigrid methods. 
Computing and visualization in sciences, 2004. submitted. 

[56] L.-B. Zhang. A Parallel Algorithm for Adaptive Local Refinement of Tctrahcdral Meshes 
Using Bisection. Numer. Math. Theory Methods Appl, 2:65-89, 2009. 



